Stabilised wavelet Mellin transform: an auditory strategy for normalising sound-source size

Authors

Toshio Irino

Roy D. Patterson

Abstract

We hear phonemes pronounced by men, women and children as approximately the same although the length of the vocal tract varies considerably from group to group. At the same time, we can identify the speaker group. This suggests that we extract and separate the size and shape information of sound sources. The impulse response of the vocal tract is compressed or expanded in time when the length of the vocal tract is compressed or expanded proportionally with the same cross-area function. The compressed and dilated versions of the impulse response can be converted into the same distribution using the Mellin transform. In this paper we show that the Mellin transform can be applied to the stabilised wavelet transform that forms the basis of the Auditory Image Model (AIM) of processing in the auditory pathway. The combined processing normalises source size information and produces a new, fruitful representation of source shape information, referred to as the “Mellin Image.” This “Stabilised Wavelet-Mellin Transform” (SWMT) also provides the mathematical framework for the derivation of the gammachirp auditory filterbank and the signal synchronous analysis in AIM.
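The scale-normalising property stated in the abstract can be checked numerically. The sketch below is not the authors' SWMT or AIM code. It is a minimal illustration that assumes a hypothetical damped-cosine impulse response and evaluates the Mellin transform on the imaginary axis s = jc, where time scaling f(t) -> f(at) multiplies the transform by a^(-jc) and so leaves its magnitude unchanged. The function names, the test signal and the integration grid are all assumptions made for illustration.

```python
import numpy as np


def impulse_response(t):
    """Hypothetical damped resonance standing in for a vocal-tract impulse response."""
    return t**3 * np.exp(-2.0 * t) * np.cos(6.0 * t)


def mellin_magnitude(f, c_values, u_min=-12.0, u_max=6.0, n=40001):
    """Approximate |M(jc)| for M(s) = integral of f(t) * t**(s - 1) dt over t > 0.

    Substituting t = exp(u) turns the integral into a Fourier integral over
    log-time u, which is evaluated here with a plain Riemann sum.
    """
    u = np.linspace(u_min, u_max, n)
    du = u[1] - u[0]
    samples = f(np.exp(u))
    kernel = np.exp(1j * np.outer(c_values, u))
    return np.abs(kernel @ samples) * du


def scaled(f, a):
    """Return t -> f(a * t): compressed in time for a > 1, dilated for a < 1."""
    return lambda t: f(a * t)


if __name__ == "__main__":
    c = np.linspace(0.0, 8.0, 9)
    reference = mellin_magnitude(impulse_response, c)
    for a in (0.7, 1.5):
        other = mellin_magnitude(scaled(impulse_response, a), c)
        print(f"a = {a}: max |difference| = {np.max(np.abs(reference - other)):.2e}")
```

The compressed and dilated signals should give the same magnitude profile as the reference, up to numerical error. The scale factor survives only in the phase, which is one way to read the abstract's separation of size from shape.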

Related resources

Extracting Size and Shape Information of Sound Source in an Optimal Auditory Processing Model

Segregating information about the size and shape of the vocal tract using a time-domain auditory model: The stabilised wavelet-Mellin transform

We hear vowels pronounced by men and women as approximately the same although the length of the vocal tract varies considerably from group to group. At the same time, we can identify the speaker group. This suggests that the auditory system can extract and separate information about the size of the vocal-tract from information about its shape. The duration of the impulse response of the vocal t...

Sound resynthesis from Auditory Mellin Image using STRAIGHT

We propose an Auditory VOCODER to resynthesize sound from the Auditory Mellin Image which is an auditory representation that segregates the size and shape information of incoming sound. The sound resynthesis part consists of three techniques: the STRAIGHT VOCODER [2], frequency-warping cepstral analysis [4,12], and nonlinear multivariate regression analysis (MRA). We explain these methods and t...

The Perception of Scale in Speech

We can recognize vowel sounds regardless of whether a man, woman or child pronounces them. Such vowel normalization has proved to be a difficult task for computer models to simulate. Motivated by observations of the auditory system Irino and Patterson have discussed the stabilized wavelet Mellin transform as a candidate method for vowel normalization. The aim of this paper is to quantify and ex...

An Auditory Vocoder Resynthesis of Speech from an Auditory Mellin Representation

An auditory Mellin transform has been proposed to segregate information about the size and shape of the vocal tract automatically; the process is also independent of glottal pitch. In this paper, we describe a method for resynthesizing speech from the Mellin representation using a high quality vocoder (STRAIGHT), and a nonlinear function to map between the two representations of speech. This en...

Publication date: 1999
